Two Phase Semi-supervised Clustering Using Background Knowledge
Abstract
Clustering with background knowledge, known as semi-supervised clustering, is an actively researched area in data mining. In this paper, we show how to use domain-specific background knowledge more efficiently. For a given data set, the number of classes is determined from the must-link constraints before clustering, and the must-linked data points are assigned to their corresponding classes. When the clustering algorithm is then applied, the cannot-link constraints are used during cluster assignment. The proposed clustering approach improves on the results of COP k-means by about 10%.
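The abstract gives only an outline of the two phases, so the following Python sketch is one possible reading of it rather than the authors' implementation. It assumes that each connected group of must-linked points seeds exactly one cluster (so the number of clusters equals the number of such groups), that Euclidean distance is used, and that the second phase is a COP k-means style assignment that skips any cluster containing a cannot-linked partner. The function names `must_link_groups` and `two_phase_cluster` and the parameter `n_iter` are hypothetical.

```python
from math import dist


def must_link_groups(n, must_link):
    """Phase 1: merge must-linked points into groups with union-find."""
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in must_link:
        parent[find(a)] = find(b)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    # Only groups with at least two points carry must-link information.
    return [g for g in groups.values() if len(g) > 1]


def mean(points):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(c) / len(points) for c in zip(*points))


def two_phase_cluster(points, must_link, cannot_link, n_iter=10):
    groups = must_link_groups(len(points), must_link)
    # Assumption: one cluster per must-link group, fixed for the whole run.
    labels = [None] * len(points)
    for k, group in enumerate(groups):
        for i in group:
            labels[i] = k
    fixed = {i for group in groups for i in group}
    centers = [mean([points[i] for i in group]) for group in groups]

    forbidden = {}
    for a, b in cannot_link:
        forbidden.setdefault(a, set()).add(b)
        forbidden.setdefault(b, set()).add(a)

    free = [i for i in range(len(points)) if i not in fixed]
    for _ in range(n_iter):
        # Phase 2: reassign the free points, nearest feasible center first.
        for i in free:
            labels[i] = None
        for i in free:
            order = sorted(range(len(centers)),
                           key=lambda k: dist(points[i], centers[k]))
            for k in order:
                if all(labels[j] != k for j in forbidden.get(i, ())):
                    labels[i] = k
                    break
            # A point with no feasible cluster keeps the label None.
        centers = [mean([points[i] for i, lab in enumerate(labels) if lab == k])
                   for k in range(len(centers))]
    return labels


points = [(0, 0), (0, 1), (10, 10), (10, 11), (1, 0), (9, 10), (5, 5)]
labels = two_phase_cluster(points,
                           must_link=[(0, 1), (2, 3)],
                           cannot_link=[(6, 0)])
print(labels)  # [0, 0, 1, 1, 0, 1, 1]
```

In this toy data the point (5, 5) is slightly closer to the first cluster, but its cannot-link constraint with point 0 pushes it into the second one. The sketch does not check whether the must-link and cannot-link sets contradict each other.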
Similar resources
Extracting Prior Knowledge from Data Distribution to Migrate from Blind to Semi-Supervised Clustering
Although many studies have been conducted to improve clustering efficiency, most state-of-the-art schemes suffer from a lack of robustness and stability. This paper aims to propose an efficient approach to elicit prior knowledge, in the form of must-link and cannot-link constraints, from the estimated distribution of raw data in order to convert a blind clustering problem into a semi-supervised o...
Wised Semi-Supervised Cluster Ensemble Selection: A New Framework for Selecting and Combing Multiple Partitions Based on Prior knowledge
The Wisdom of Crowds, an innovative theory described in social science, claims that the aggregate decisions made by a group will often be better than those of its individual members if the four fundamental criteria of this theory are satisfied. This theory has been applied to clustering problems. Previous research showed that this theory can significantly increase the stability and performance of...
Limitations of Using Constraint Set Utility in Semi-Supervised Clustering
Semi-supervised clustering algorithms allow the user to incorporate background knowledge into the clustering process. Often, this background knowledge is specified in the form of must-link (ML) and cannot-link (CL) constraints, indicating whether certain pairs of elements should be in the same cluster or not. Several traditional clustering algorithms have been adapted to operate in this setting...
A Semi-Supervised Approach for Kernel-Based Temporal Clustering
Temporal clustering refers to the partitioning of a time series into multiple nonoverlapping segments that belong to k temporal clusters, in such a way that segments in the same cluster are more similar to each other than to those in other clusters. Temporal clustering is a fundamental task in many fields, such as computer animation, computer vision, health care, and robotics. The applications ...
Publication date: 2006